Fix Azure IMDS Url in InstanceMetadataService initialization #4600

whites11 · 2022-01-12T09:02:05Z

Which component this PR applies to?

cluster-autoscaler

What type of PR is this?

/kind bug

What this PR does / why we need it:

In CA 1.21 on Azure, the Subscription ID could be omitted entirely.
On startup, Cluster Autoscaler detected it using the Azure Instance Metadata Service (IMDS).

It did so using the InstanceMetadataService from legacy-cloud-providers.

The service is initialized as follows:

metadataService, err := providerazure.NewInstanceMetadataService(metadataURL)

In 1.21 this worked just fine.

In 1.22, the meaning of the metadataURL parameter changed in the service code.
In the version used by CA 1.21, it was the full instance URL, including the path (http://169.254.169.254/metadata/instance); in the version used by CA 1.22, it is the IMDS root (http://169.254.169.254), to which the service appends the path itself.

CA was not updated to match, so it crashes on startup when no Subscription ID is provided:

F0112 08:57:17.405106       1 azure_cloud_provider.go:167] Failed to create Azure Manager: failure of getting instance metadata with response "404 Not Found"

This happens because the computed IMDS URL contains the path twice:

http://169.254.169.254/metadata/instance/metadata/instance

which the IMDS endpoint naturally answers with a 404.
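The two meanings of the argument, and how the stale value yields the doubled path, can be modeled in a few lines. This is an illustrative sketch, not the legacy-cloud-providers source: instanceURLOld, instanceURLNew and the imdsInstanceURI constant are hypothetical names, and the appended path is inferred from the doubled URL shown above.

```go
package main

import "fmt"

// imdsInstanceURI is the path that, per the behavior described above, the
// service used by CA 1.22 appends to the value it receives.
const imdsInstanceURI = "/metadata/instance"

// instanceURLOld models the 1.21-era semantics: the argument is already the
// full instance URL and is used as-is.
func instanceURLOld(metadataURL string) string { return metadataURL }

// instanceURLNew models the 1.22-era semantics: the argument is the IMDS root
// and the instance path is appended to it.
func instanceURLNew(imdsServer string) string { return imdsServer + imdsInstanceURI }

func main() {
	stale := "http://169.254.169.254/metadata/instance" // value CA kept passing
	fmt.Println(instanceURLOld(stale))                    // correct under 1.21
	fmt.Println(instanceURLNew(stale))                    // doubled path, IMDS returns 404
	fmt.Println(instanceURLNew("http://169.254.169.254")) // correct under 1.22
}
```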

This small PR fixes exactly that by passing the correct value, the IMDS root URL, to the NewInstanceMetadataService function.
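The PR only changes the argument. A hypothetical defensive variant, not part of this PR, would reduce whatever URL it is given to its scheme and host before passing it on, so that both the old and the new form of the value work. normalizeIMDSRoot is an illustrative name, not an existing function.

```go
package main

import (
	"fmt"
	"net/url"
)

// normalizeIMDSRoot is a hypothetical helper: it accepts either the IMDS root
// ("http://169.254.169.254") or the full instance URL and returns only
// scheme://host, which is what the 1.22-era service expects.
func normalizeIMDSRoot(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme == "" || u.Host == "" {
		return "", fmt.Errorf("IMDS URL %q must include a scheme and host", raw)
	}
	// Drop any path, query or fragment; keep the host (and port, if any).
	return u.Scheme + "://" + u.Host, nil
}

func main() {
	for _, in := range []string{
		"http://169.254.169.254/metadata/instance", // full instance URL (1.21-style value)
		"http://169.254.169.254",                   // IMDS root (1.22-style value)
	} {
		root, err := normalizeIMDSRoot(in)
		fmt.Println(root, err)
	}
}
```

The trade-off is that any path in the input is silently discarded, which is only safe because the service expects a bare root.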

Which issue(s) this PR fixes:

None that I could find.

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Fixed a panic in Cluster Autoscaler on Azure when no Subscription ID was provided by the user.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

k8s-ci-robot · 2022-01-12T09:02:14Z

Welcome @whites11!

It looks like this is your first PR to kubernetes/autoscaler 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes/autoscaler has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

whites11 · 2022-01-12T09:15:29Z

The failing test sounds like a flake to me, or at least I doubt my change introduced it.

marwanad · 2022-01-12T20:46:53Z

Seems like the storage account team disabled an endpoint that was previously used for testing. You can fix the unit test by cherry-picking #4594.

@nilo19 can you see if there's value in that unit test given the above and remove it from all supported branches if not?

marwanad · 2022-01-12T20:47:34Z

This PR looks good to me.

@whites11 I think we have removed IMDS auth in latest CA master - do you see value in bringing that back?

whites11 · 2022-01-12T21:28:11Z

> This PR looks good to me.
>
> @whites11 I think we have removed IMDS auth in latest CA master - do you see value in bringing that back?

I totally do. I have a PR in the works to add it back.

nilo19 · 2022-01-13T02:56:15Z

@whites11 let's disable the flaky test before completely fixing it.

whites11 · 2022-01-13T08:08:19Z

> @whites11 let's disable the flaky test before completely fixing it.

Done.

feiskyer · 2022-01-13 (review)

Thanks for the fix.

/lgtm
/approve

k8s-ci-robot · 2022-01-13T10:08:13Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: feiskyer, whites11

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~cluster-autoscaler/cloudprovider/azure/OWNERS~~ [feiskyer]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

whites11 · 2022-01-13T10:09:47Z

> Thanks for the fix.
>
> /lgtm /approve

Thanks for the quick review and merge.
How do I go about requesting a 1.22.3 release?

feiskyer · 2022-01-13T10:11:22Z

+@MaciekPytel what are our plans for the next patch release? Is it possible to cut one on request?

MaciekPytel · 2022-01-17T12:40:47Z

Usually we cut them when requested via the SIG. Probably the best way to request one is by adding an item to the SIG meeting agenda.

How critical is this fix? We just did patch releases in late December, so doing another one right now would be a bit soon. But if it's critical, then we can bring it up in today's SIG meeting and set the cut date for next week (to give other providers some time to get their patches in).

whites11 · 2022-01-17T12:52:00Z

> Usually we cut them when requested via the SIG. Probably the best way to request one is by adding an item to the SIG meeting agenda.
>
> How critical is this fix? We just did patch releases in late December, so doing another one right now would be a bit soon. But if it's critical, then we can bring it up in today's SIG meeting and set the cut date for next week (to give other providers some time to get their patches in).

Hey @MaciekPytel, thanks a lot for your answer.
I don't know how you define "critical".
There is a use case (not sure how common it is, but it is the use case we have at Giant Swarm) that is broken (cluster autoscaler crash-looping).
There are workarounds, sure, but from our point of view this is critical.

I will add an entry to the agenda and see what happens.

Thanks again!


k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jan 12, 2022

k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Jan 12, 2022

k8s-ci-robot requested review from feiskyer and nilo19 January 12, 2022 09:02

whites11 mentioned this pull request Jan 12, 2022

Azure v17.0.0-alpha1 release giantswarm/roadmap#520 (Closed, 24 tasks)

jbartosik added the area/cluster-autoscaler label Jan 12, 2022

Fix Azure IMDS Url in InstanceMetadataService initialization (c1397c5)

k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 13, 2022

feiskyer reviewed Jan 13, 2022


k8s-ci-robot assigned feiskyer Jan 13, 2022

k8s-ci-robot added lgtm "Looks good to me", indicates that a PR is ready to be merged. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Jan 13, 2022

k8s-ci-robot merged commit 1c2491d into kubernetes:cluster-autoscaler-release-1.22 Jan 13, 2022

whites11 deleted the azure-metadata-fix branch January 13, 2022 10:09

MaciekPytel mentioned this pull request Jan 17, 2022

Azure unittest failing on master #4592 (Closed)

fullykubed mentioned this pull request Jan 21, 2022

Specifying ARM_SUBSCRIPTION_ID is now mandatory when using the cluster-autoscaler on AKS with MSI auth #4635 (Closed)

whites11 mentioned this pull request Feb 23, 2022

Default Azure subscription ID if not provided using instance metadata #4708 (Closed)

whites11 added a commit to giantswarm/cluster-autoscaler-app that referenced this pull request Mar 8, 2022

Use GS-built 1.22 image to deliver upstream unreleased fix kubernetes/autoscaler#4600 (0856e9b)

whites11 mentioned this pull request Mar 8, 2022

Use GS-built 1.22 image to deliver upstream unreleased fix https://gi… giantswarm/cluster-autoscaler-app#150 (Merged)

whites11 added a commit to giantswarm/cluster-autoscaler-app that referenced this pull request Mar 8, 2022

Use GS-built 1.22 image to deliver upstream unreleased fix kubernetes/autoscaler#4600 (#150) (59a23f0)
